\(^1\) GenomEast platform, IGBMC
Go back to http://ensembl.org.
Comment – Go back to the “Gene: BBS5” tab and have a look at the colors of transcripts BBS5-201 and BBS5-202 in the genome browser view.
Display RNAseq data for breast and skeletal muscle in the genome browser (in BRCA1 gene region).
Using Ensembl/BioMart, retrieve all transcript IDs and the gene ID of the human IDH1 gene. How many transcripts does IDH1 have? Use Ensembl Genes 109, for Human GRCh38.p13:
9 transcripts are found.
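The same request can also be expressed as a BioMart XML query instead of clicking through the web interface. The sketch below only builds the query string and counts transcripts in a result table; it does not contact the server. The endpoint URL and the internal names (`hsapiens_gene_ensembl`, `external_gene_name`, `ensembl_gene_id`, `ensembl_transcript_id`) are assumptions based on the public Ensembl BioMart; the XML button in the BioMart toolbar shows the authoritative form for a given release. The helper names `build_biomart_query` and `count_transcripts` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed endpoint of the current Ensembl BioMart web service.
# An archive release (for example v109) is served from its own host, not shown here.
MARTSERVICE_URL = "https://www.ensembl.org/biomart/martservice"


def build_biomart_query(dataset, filters, attributes):
    """Return a BioMart XML query string for a single dataset."""
    query = ET.Element(
        "Query",
        virtualSchemaName="default",
        formatter="TSV",
        header="1",
        uniqueRows="1",
        datasetConfigVersion="0.6",
    )
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", name=name, value=value)
    for name in attributes:
        ET.SubElement(ds, "Attribute", name=name)
    return ET.tostring(query, encoding="unicode")


def count_transcripts(tsv_text):
    """Count distinct transcript IDs in a TSV result (header row, ID in column 2)."""
    rows = [line.split("\t") for line in tsv_text.strip().splitlines()[1:]]
    return len({row[1] for row in rows if len(row) > 1})


idh1_query = build_biomart_query(
    dataset="hsapiens_gene_ensembl",
    filters={"external_gene_name": "IDH1"},
    attributes=["ensembl_gene_id", "ensembl_transcript_id"],
)
print(idh1_query)
```

The resulting string would be sent as the `query` parameter of a request to `MARTSERVICE_URL`, and the returned text passed to `count_transcripts`.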
Extract all exon sequences of the IDH1 gene in fasta format. Headers will contain the Gene names, Transcript stable IDs and Exon stable IDs.
You can leave the Dataset and Filters the same, and go directly to the Attributes section:
Extract all coding sequences of the IDH1 gene in fasta format. Headers will contain the Transcript stable IDs and Exon stable IDs.
You can leave the Dataset and Filters the same, and go directly to the Attributes section:
Retrieve the GO terms associated with the IDH1 gene (select GO Term Name, GO domain and GO Term Accession along with Gene stable ID, Transcript stable ID and Gene Name).
You can leave the Dataset and Filters the same, and go directly to the Attributes section:
Retrieve the germline variations found in this gene. Annotations to be found:
You can leave the Dataset and Filters the same, and go directly to the Attributes section:
We have run an RNA-seq experiment and extracted the upregulated genes. We are using RNA-seq data from:
Strub, T., Giuliano, S., Ye, T., Bonet, C., Keime, C., Kobi, D., Le Gras, S., Cormont, M., Ballotti, R., and Bertolotto, C. (2011). Essential role of microphthalmia transcription factor for DNA replication, mitosis and genomic stability in melanoma. Oncogene 30, 2319–2332.
In this study, the authors compared the transcriptome of melanoma cell lines treated with an siRNA against MITF or with an siRNA against Luciferase (used as a control). The data are given as a TSV (Tab-Separated Values) file which contains the number of reads per gene. Genes are identified by their Ensembl gene IDs.
Data have been analyzed using the Human genome hg38/GRCh38 - Ensembl v95.
Here are the different columns of the file to be analyzed:
Download and uncompress the file simitfvssiluc-up.tsv.zip, then use Ensembl/BioMart to extract annotations for the genes it contains. Use the column Gene ID to extract annotations. Annotations to extract are:
In Ensembl/BioMart, create a new request using an archive of Ensembl v95.
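BioMart's ID list filter expects one identifier per line (or a comma-separated list), so the Gene ID column has to be pulled out of the TSV file first. A minimal sketch, assuming the column header is exactly `Gene ID` as named above; the function name `read_gene_ids`, the second column and the identifiers in the in-memory example are made up for illustration.

```python
import csv
import io


def read_gene_ids(handle, column="Gene ID"):
    """Return the unique, non-empty values of one TSV column, in file order."""
    reader = csv.DictReader(handle, delimiter="\t")
    seen = {}
    for row in reader:
        gene_id = (row.get(column) or "").strip()
        if gene_id:
            seen.setdefault(gene_id, None)  # dict keeps insertion order
    return list(seen)


# Tiny in-memory stand-in for the real file (hypothetical content).
example = io.StringIO(
    "Gene ID\tlog2 fold-change\n"
    "ENSG00000000001\t1.8\n"
    "ENSG00000000002\t2.3\n"
    "ENSG00000000001\t1.8\n"
)
gene_ids = read_gene_ids(example)
print("\n".join(gene_ids))  # ready to paste into the ID list text box
```

With the real data, the `io.StringIO` object would be replaced by the uncompressed TSV file opened with `open(path, newline="")`.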
You want to run a de novo motif discovery on the promoters of all upregulated genes (the ones from the file siMitfvssiLuc.up.txt). Extract the promoter sequences of these genes: retrieve the 200 nt upstream of their transcripts.
Don’t change Dataset and Filters – simply click on Attributes.
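Which genomic interval counts as "200 nt upstream" depends on the strand of the transcript. The sketch below works this out for a single transcription start site, assuming 1-based inclusive coordinates and Ensembl's +1/-1 strand convention; the function name `upstream_window` and the coordinates used are hypothetical, and BioMart performs the equivalent computation itself when an upstream flank is requested.

```python
def upstream_window(tss, strand, length=200):
    """Return (start, end), 1-based inclusive, of the `length` nt upstream of a TSS.

    On the forward strand (+1) upstream means smaller coordinates.
    On the reverse strand (-1) the TSS is the larger genomic coordinate of the
    transcript, and upstream means larger coordinates.
    """
    if strand == 1:
        start, end = tss - length, tss - 1
    elif strand == -1:
        start, end = tss + 1, tss + length
    else:
        raise ValueError("strand must be 1 or -1")
    return max(start, 1), end  # do not run off the start of the chromosome


print(upstream_window(1000, 1))   # (800, 999)
print(upstream_window(1000, -1))  # (1001, 1200)
```

For a reverse-strand transcript the extracted sequence must additionally be reverse-complemented to read 5' to 3' relative to the gene.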
TIPS
Flank (Transcript) will give the flanks for all transcripts of a gene with multiple transcripts. Flank (Gene) will give the flanks for one possible transcript in a gene (the most 5’ coordinates for upstream flanking).
Use BioMart of the current version of Ensembl.
How many genes are located in the genomic region 2:208226227-208276270?
In Ensembl/BioMart, create a new request:
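A region string such as 2:208226227-208276270 splits into a chromosome name, a start and an end, which is what the BioMart region filters ask for. The sketch below parses the string and applies an overlap test to a few invented gene records; the names `Region`, `parse_region` and `overlaps` are hypothetical, the gene coordinates are made up, and the overlap semantics (a gene counts if any part of it lies in the region) is an assumption about how the filter behaves.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


def parse_region(text):
    """Parse 'chrom:start-end' into a Region (thousands separators tolerated)."""
    chrom, span = text.split(":")
    start, end = span.replace(",", "").split("-")
    return Region(chrom, int(start), int(end))


def overlaps(region, chrom, start, end):
    """True if the interval [start, end] on `chrom` shares a base with `region`."""
    return chrom == region.chrom and start <= region.end and end >= region.start


region = parse_region("2:208226227-208276270")

# Invented gene records: (name, chromosome, start, end).
genes = [
    ("geneA", "2", 208200000, 208230000),  # overlaps the left edge
    ("geneB", "2", 208240000, 208250000),  # fully inside
    ("geneC", "2", 208300000, 208310000),  # downstream of the region
    ("geneD", "3", 208240000, 208250000),  # other chromosome
]
hits = [name for name, chrom, start, end in genes if overlaps(region, chrom, start, end)]
print(hits)  # ['geneA', 'geneB']
```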
Extract the coordinates of all human genes located on chromosomes (exclude scaffolds). Information to extract for each gene:
In Ensembl/BioMart, create a new request:
The following is a list of 11 IDs of human proteins from the NCBI RefSeq database:
Generate a list that shows to which Gene stable IDs and to which Gene names these RefSeq IDs correspond. Do these 11 proteins correspond to 11 genes?
Choose the ENSEMBL Genes 109 database.
Choose the Homo sapiens genes (GRCh38.p13) dataset.
Click on Filters in the left panel.
Expand the GENE section by clicking on the + box.
Select ID list limit - RefSeq peptide ID(s) and enter the list of IDs in the text box (either comma separated or as a list).
HINT: You may have to scroll down the menu to see these.
Count shows 3 genes (remember that one gene may have multiple splice variants coding for different proteins, which is why these 11 proteins do not correspond to 11 genes).
Click on Attributes in the left panel.
Select the Features attributes page.
Expand the External section by clicking on the + box.
Select Gene name and RefSeq Protein ID from the External References section.
Click the Results button on the toolbar.
Select View All rows as HTML or export all results to a file. Tick the box Unique results only.
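Once the results are exported, the many-to-one relationship between proteins and genes can be checked by grouping the rows by gene. A minimal sketch, assuming each row holds a Gene stable ID, a Gene name and a RefSeq protein ID in that order; the function name `group_by_gene` and all identifiers in the example rows are placeholders, not the IDs from the exercise.

```python
from collections import defaultdict


def group_by_gene(rows):
    """Map (gene_id, gene_name) to the sorted list of protein IDs found for it."""
    genes = defaultdict(set)
    for gene_id, gene_name, refseq_id in rows:
        if refseq_id:  # skip rows with no RefSeq match
            genes[(gene_id, gene_name)].add(refseq_id)
    return {gene: sorted(ids) for gene, ids in genes.items()}


# Placeholder rows standing in for an exported BioMart table.
rows = [
    ("GENE_ID_1", "GENE_A", "NP_PLACEHOLDER_1"),
    ("GENE_ID_1", "GENE_A", "NP_PLACEHOLDER_2"),
    ("GENE_ID_2", "GENE_B", "NP_PLACEHOLDER_3"),
    ("GENE_ID_1", "GENE_A", "NP_PLACEHOLDER_2"),  # duplicate row
]
grouped = group_by_gene(rows)
for (gene_id, gene_name), proteins in grouped.items():
    print(gene_id, gene_name, len(proteins), ",".join(proteins))
print("proteins:", sum(len(p) for p in grouped.values()), "genes:", len(grouped))
```

Here three distinct proteins collapse onto two genes, the same effect that makes the 11 RefSeq IDs map to fewer than 11 genes.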
Forrest et al. performed a microarray analysis of peripheral blood mononuclear cell gene expression in benzene-exposed workers (Environ Health Perspect. 2005 June; 113(6): 801–807). The microarray used was the human Affymetrix U133A/B (also called U133 plus 2) GeneChip. The top 8 up-regulated probe-sets were:
Retrieve for the genes corresponding to these probe-sets the Gene and Transcript stable IDs as well as their Gene names and descriptions.
Choose the ENSEMBL Genes 109 database.
Choose the Homo sapiens genes (GRCh38.p13) dataset.
Click on Filters in the left panel.
Expand the GENE section by clicking on the + box.
Select ID list limit - Affy hg u133 plus 2 probeset ID(s) and enter the list of probeset IDs in the text box (either comma separated or as a list).
Count shows 9 genes match this list of probesets.
Click on Attributes in the left panel.
Select the Features attributes page.
Expand the GENE section by clicking on the + box.
In addition to the default selected attributes, select Description.
Expand the External section by clicking on the + box.
Select Gene name from the External References section and AFFY HG U133-PLUS-2 from the Microarray Attributes section.
Click the Results button on the toolbar.
Select View All rows as HTML or export all results to a file. Tick the box Unique results only.
Your results should show that the listed probe-sets map to 9 Ensembl genes.
In order to be able to study these human genes in mouse, identify their mouse orthologues. Also retrieve the genomic coordinates of these orthologues.
You can leave the Dataset and Filters the same, and go directly to the Attributes section:
Click on Attributes in the left panel.
Select the Homologs attributes page.
Expand the GENE section by clicking on the + box.
Select Associated Gene Name.
Deselect Ensembl Transcript ID.
Expand the ORTHOLOGS section by clicking on the + box.
Select Mouse Ensembl Gene stable ID, Mouse chromosome/scaffold name, Mouse chromosome/scaffold Start (bp) and Mouse chromosome/scaffold End (bp).
Click the Results button on the toolbar.
Check the box Unique results only. Select View All rows as HTML or export all results to a file.
Your results should show that most of the human genes have at least one mouse orthologue.
Some exercises are taken from Ensembl tutorials.
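The exported table can be summarised to see which human genes have an orthologue and which do not. A minimal sketch, assuming each row pairs a human gene ID with a mouse gene ID and that a missing orthologue appears as an empty field; the function name `orthologue_summary` and the identifiers are placeholders.

```python
def orthologue_summary(rows):
    """Split human genes into those with and without at least one mouse orthologue."""
    mouse_by_human = {}
    for human_gene, mouse_gene in rows:
        targets = mouse_by_human.setdefault(human_gene, set())
        if mouse_gene:  # an empty field means no orthologue on this row
            targets.add(mouse_gene)
    with_orth = sorted(g for g, t in mouse_by_human.items() if t)
    without_orth = sorted(g for g, t in mouse_by_human.items() if not t)
    return with_orth, without_orth


# Placeholder rows standing in for an exported BioMart table.
rows = [
    ("HUMAN_1", "MOUSE_1"),
    ("HUMAN_2", "MOUSE_2"),
    ("HUMAN_2", "MOUSE_3"),  # one human gene, two mouse orthologues
    ("HUMAN_3", ""),         # no orthologue reported
]
with_orth, without_orth = orthologue_summary(rows)
print("with orthologue:", with_orth)
print("without orthologue:", without_orth)
```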